System and method for analyzing data

ABSTRACT

The present invention provides a method and system for analyzing data by characterizing incoming data. The incoming data is analyzed and processed to provide scores. The scores are further processed, by processing the incoming data record against an internal database catalog, to provide a final score that serves as the final diagnosis of the method and system of the present invention. In a preferred embodiment of the present invention, the incoming data are information units regarding customer behavior that are analyzed to characterize customers for retention purposes.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority from PCT Application No. PCT/IL01/01074, filed Nov. 21, 2001, and Israeli Patent Application No. 146597, filed Nov. 20, 2001, each of which is hereby incorporated by reference as if fully set forth herein.

BACKGROUND OF THE INVENTION

[0002] The present invention generally relates to a system and method for detecting and analyzing information units. More specifically, the present invention relates to the detection and analysis of data records by selecting particular aspects of the data record.

[0003] Computerized systematic data analysis of information units is performed today mainly by using decision trees. Decision trees within the computerized systems utilize specifically designated data stored in system databases to categorize information units received as input data. The specifically designated data used for categorizing the information units is based on assumptions such as statistics or specific requirements of the system. One example of the use of such systems is the analysis performed for fraud detection in credit card transactions. The fraud detection analysis system can detect anomalous transactions according to designated data. A transaction involving a purchase for a sum that substantially exceeds the “normal” designated sum will generate an alert or a warning, or provide suitable instructions to supervising routines of the system or to a specific user. Nevertheless, the designated stored control data utilized for generating the indication of fraudulent transactions has limitations. The source of these limitations is the inaccuracy that derives from the inherent nature of such data, which attempts to predict the future with knowledge gained in the past, and from the difficulty of characterizing a credit card holder's “normal” behavior. This difficulty originates in the wide variety of factors that influence a person's behavior, such as religion, seasons of the year, family status, ethnic origin, and the like. Another difficulty in providing accurate information concerns credit card holders who do not have a simple pattern of transaction performance. U.S. Pat. No. 5,819,226 discloses a prior art system known in the field of fraudulent behavior detection. The patent provides an automated system and method for detecting fraudulent transactions using the neural network method as a predictive model.
The neural network model “learns” a pattern that it can later identify. The learning process is based on a given number of iterations executed by the neural network based detection system. However, the ability of a fraud detection system based upon a neural network is substantially limited, and all too often such a system falsely diagnoses transactions as fraudulent. The principal reason for the false diagnoses is the manner in which the neural network method operates. Within a fraud detection system, the neural network method is limited in that it learns the pattern of a single customer (a credit card holder) or a group of customers, together with their fraudulent behavior pattern, and produces a score based on the “learned” patterns. Consequently, the neural network produces a large number of false recognitions, such as identifying non-fraudulent credit card transactions as fraudulent. The limitations of neural networks are due to their inability to deal with “trouble making” customers who have an erratic behavior pattern rather than a simple one.

[0004] Therefore, there is an urgent need to introduce a method and system that will provide accurate information regarding the input information units. There is also an urgent need for improved customer retention applications. The importance of customer retention applications within modern commerce is considerable. Business executives as well as commercial retail outlet owners are keenly aware that in order to retain their customers a substantially automatic learning process must be performed. There is a need for a method and system that processes and analyzes customer behavior information received into the system and provides the characteristics of customers concerning their dealings with specific businesses. The system and method proposed by the present invention provide information related to customer behavior by concurrently analyzing various fields within information units containing diverse types of data.

SUMMARY OF THE INVENTION

[0005] The present invention relates to a computing environment accommodating at least one input device connectable to at least one server device connectable to at least one output device, a method of processing at least one information unit introduced by the at least one input device by the at least one server device to create at least one information score based on the at least one information unit, the method comprising the steps of: creating at least one complexity catalog based on the at least one information unit, and establishing at least one score unit based on the at least one complexity catalog, and establishing scores for assessment.

[0006] The above-mentioned method further comprises the steps of: obtaining at least one information unit from the at least one input device by the at least one server device, and displaying the at least one scoring unit. The information unit processed within the present invention can include data about the behavior of an individual, such as a customer, or of a group of customers.

[0007] The present invention also relates to a computing environment accommodating at least one input device connected to at least one server device having at least one output device, a system for the processing of at least one information unit introduced via the at least one input device by the at least one server device to process at least one information unit based, the system comprising the elements of: an infrastructure server device to create at least one complexity catalog, and a complexity catalog to hold at least one list of ordered complexity values associated with the partitioned sub-unit blocks, and an application server to build at least one information summary unit based on the at least one information unit and on at least one associated complexity catalog, and a scoring component to provide scores.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:

[0009] FIG. 1 is a schematic block diagram of a system environment of the preferred embodiment of the present invention; and

[0010] FIG. 2 is a schematic block diagram of the information detection and analysis system of the preferred embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0011] Preferred embodiments will now be described with reference to the drawings. For clarity of description, any element numeral in one figure will represent the same element if used in any other figure.

[0012] The present invention provides a method and system for analysis of data (SAD) by executing complexity calculations on a set of data fields, enabling segmentation of the data, providing scores to the separate segments and providing a final score for clusters of the analyzed segments of data. The SAD can be used in a variety of applications such as customer retention, marketing, analysis of credit card transactions and the like. In one preferred embodiment of the present invention the SAD aids a business enterprise in customer retention. By contrast, in the present state of the art, customer retention systems mostly rely on analysis of performance measures, such as quantitative metrics and formulas, for detecting potential performance problems. The measures used include, amongst others, retention rate, loyalty index and satisfaction index. These measures are analyzed along the time axis and along other customer dimensions such as customer category, market geography segment, and the like. The customer retention method and system proposed by the preferred embodiment of the present invention includes a complexity calculation tool that attacks the customer retention problem from a different, yet complementary, angle. It assigns a numerical parameter to the complexity of the pattern, i.e., whether it is simple and monotonic or erratic and unpredictable, and alerts whenever this parameter changes, e.g. when a transaction of a simply behaving account deviates from its monotonic behavior, or when monotonic and recurring transactions appear in an erratic account. The customer retention system and method proposed by the preferred embodiment of the present invention provides an end result in the form of a final score. The final score indicates whether a specific customer is about to be lost to the business.

[0013] The new addition to the prior art in business intelligence is divided into two major components: a parameterization of the data by calculating the complexity of a given record, and an analysis method of data records by calculating the complexity of one or more dimensions of the data in a subject oriented way. The parameterization of data makes it possible to attach meaning to a parameter (e.g. a high complexity value means erratic behavior and an unstable customer, while a low complexity value means monotonic behavior and a stable customer). By segmenting the data according to a parameter, a better classification can be made (e.g. all the high complexity customers are grouped together in an “erratic behavior group”, while the low complexity customers are in the “monotonic group”). The records are classified according to their behavior and not according to a predetermined property (e.g. demographics). After stratifying the records according to their complexity parameter, further enhanced analysis will be provided by the utilization of additional tools. For example, analysis by neural networks per cluster of records with a similar complexity parameter will produce a better result and a higher prediction rate than analyzing the entire set of records together or grouping by a predefined property. The similar complexity groups are better suited to the neural network characteristics than most other types of grouping. Analysis by additional tools (e.g. decision trees and the like) for each cluster of similar complexity records will produce better results for prediction, analysis and understanding of the data.
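
By way of a non-limiting illustration, the segmentation described above might be sketched as follows. The sketch assumes that a complexity value has already been computed for each customer; the function name segment_by_complexity, the two-group split and the threshold value are hypothetical and are not specified in this description.

```python
def segment_by_complexity(complexities, threshold):
    """Split subjects into a "monotonic" and an "erratic" group.

    complexities: mapping of subject id -> complexity value (assumed precomputed).
    threshold: hypothetical cut-off; values at or above it count as erratic.
    """
    groups = {"monotonic": [], "erratic": []}
    for subject, value in complexities.items():
        key = "erratic" if value >= threshold else "monotonic"
        groups[key].append(subject)
    return groups


# Hypothetical complexity values for three customers.
example_complexities = {"cust_1": 0.2, "cust_2": 1.8, "cust_3": 0.1}
groups = segment_by_complexity(example_complexities, threshold=1.0)
print(groups)
# prints {'monotonic': ['cust_1', 'cust_3'], 'erratic': ['cust_2']}
```

Each resulting group could then be handed to a separate downstream tool (for example a per-group neural network or decision tree), in line with the per-cluster analysis described above.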

[0014] The analysis method of data records by calculating the complexity of the data in a subject oriented way enables the selection of the fields of interest and the calculation of the other fields' complexity, thus showing the behavior of the selected fields. For example, in CRM data records, the Customer field can be selected and the complexity of the “Time of Call”, “Length of Call” and “Transaction Made” fields can be calculated, thereby analyzing the behavior of all the customers. This innovative analysis method will show whether customers always make the same transaction in a short call (low complexity) or call at different times of the day, making all kinds of transactions (high complexity). The analysis provided by the present invention makes it possible to detect changes of pattern within a specific field, or a combination of fields, within an information unit. Thus, as the preferred embodiment is within the field of customer retention, the behavior or a change of behavior of a customer or group of customers over the time axis can be detected.
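
The subject oriented calculation described above may be illustrated by the following minimal sketch. The complexity calculation itself is defined in the referenced PCT application and is not reproduced here; the sketch substitutes Shannon entropy of the observed field values purely as a stand-in measure. The field names, record layout and function names are likewise assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def shannon_entropy(values):
    """Stand-in complexity measure (an assumption): entropy in bits of the values."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * log2(total / c) for c in counts.values())


def complexity_by_subject(records, subject_field, measured_fields):
    """Group records by the selected subject field and score every other field."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for name in measured_fields:
            grouped[rec[subject_field]][name].append(rec[name])
    return {
        subject: {name: shannon_entropy(vals) for name, vals in fields.items()}
        for subject, fields in grouped.items()
    }


# Hypothetical CRM records: customer A repeats one pattern, customer B varies.
records = [{"customer": "A", "time_of_call": "morning", "transaction": "balance"}] * 4
records += [
    {"customer": "B", "time_of_call": "morning", "transaction": "balance"},
    {"customer": "B", "time_of_call": "afternoon", "transaction": "transfer"},
    {"customer": "B", "time_of_call": "evening", "transaction": "payment"},
    {"customer": "B", "time_of_call": "night", "transaction": "loan"},
]
result = complexity_by_subject(records, "customer", ["time_of_call", "transaction"])
print(result["A"])  # prints {'time_of_call': 0.0, 'transaction': 0.0}
print(result["B"])  # prints {'time_of_call': 2.0, 'transaction': 2.0}
```

Under this stand-in measure, customer A (always the same call pattern) receives the lowest possible value, while customer B (a different time and transaction on every call) receives a higher one, mirroring the low and high complexity cases described above.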

[0015] For example, by calculating the complexity of all the agents, in CRM data records, this analysis can alert when an agent has deviated from his normal behavior (i.e. when his complexity has changed from previous complexity calculations). Thus, for example, the supervisor can be alerted that the specific agent has changed his behavior.

[0016] By analyzing the complexity of all the fields (e.g. in CRM customers, agents, transactions, etc.), at different times, the new analysis method can detect a change in behavior and generate an alert concerning the change.
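
As a non-limiting sketch of the change detection described above, complexity values calculated at two different times can be compared and an alert raised for each subject whose value has moved by more than a chosen tolerance. The function name, the tolerance parameter and the sample values are hypothetical; the description does not specify how large a change triggers an alert.

```python
def detect_complexity_changes(previous, current, tolerance):
    """Return (subject, old, new) for subjects whose complexity changed notably.

    previous, current: mappings of subject id -> complexity value at two times.
    tolerance: hypothetical threshold on the absolute change.
    Subjects with no earlier value are skipped, since no change can be measured.
    """
    alerts = []
    for subject, new_value in current.items():
        old_value = previous.get(subject)
        if old_value is None:
            continue
        if abs(new_value - old_value) > tolerance:
            alerts.append((subject, old_value, new_value))
    return alerts


# Hypothetical complexity values for agents at two points in time.
previous = {"agent_7": 0.5, "agent_9": 1.5}
current = {"agent_7": 0.5, "agent_9": 0.25, "agent_12": 1.0}
alerts = detect_complexity_changes(previous, current, tolerance=0.5)
print(alerts)  # prints [('agent_9', 1.5, 0.25)]
```

In this example only agent_9 would be reported to the supervisor, because agent_7 is unchanged and agent_12 has no earlier calculation to compare against.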

[0017] The preferred embodiment will be better understood by referring to FIG. 1, which illustrates the environment of the SAD 18. The SAD 18 receives data input from users 10, 12, and 14 via the data communication network 20 (DCN). Users 10, 12 and 14 can be individuals forwarding information units regarding particular customers, businesses forwarding information regarding the behavior of all their customers, or a transmitting center or agent transmitting information regarding customer behavior from one or more locations. The DCN 20 can be the Internet, a LAN, a WAN, a satellite communication network and the like. The most common DCN 20 used is the standard telephone system (POTS) that enables communication via ordinary phone connection lines.

[0018] Referring now to FIG. 2, the SAD 18 includes an input device 56, a communication device 54, an output device 58 and an analysis and evaluation server platform 22. The input device 56 can be a pointing device, a keyboard device or the like. The output device 58 can be a printer, a screen display or the like. The communication device 54 can be a modem, a network interface card or any other suitable communication device providing transmission and reception of data via the DCN 20 of FIG. 1. According to the preferred embodiment of the present invention, the analysis and evaluation server platform 22 includes a processor device 24 and a memory device 26. The processor device 24 is the logic unit designed to perform arithmetic and logic operations by responding to and processing the basic instructions driving the computing device. The processor device 24 can be one of the Intel Pentium series, the PowerPC series, the K6 series, the Celeron, the Athlon, the Duron, the Alpha, or the like. The memory device 26 includes a reference transaction database 28, an operating system 30, a control database 32, a complexity database catalog 36 and an application server 38. The reference transaction database 28 includes database information including a list of customers, personal information regarding customers, history files containing customer behavior and other relevant information related to credit card holders and agents. The reference transaction database 28 can be located within the SAD 18 as illustrated in FIG. 2 or in any other separate location. The operating system 30 is responsible for managing the operation of the entire set of software programs implemented in the operation of the SAD 18. The operating system 30 can be any known operating system such as Windows NT, Windows XP, UNIX, Linux, VMS, OS/400, AIX, OS X and the like. The complexity database catalog 36 includes all the complexity values assigned to the records processed by the complexity engine 52.
The complexity values stored within the complexity database catalog 36 are discussed in detail in the pending PCT application PCT/IL01/01074 incorporated herein by reference. The control database 32 controls the input data received by the input device 56 and the transfer thereof to the application server 38. The control database 32 also directs the movement of the data from the reference transaction database 28 to the application server 38 and from the application server 38 to the complexity database catalog 36. The application server 38 within the preferred embodiment includes a complexity catalog handler 40, a scoring component 42, a learning component 44, a database handler 46, a resource allocation component 48, a user interface component 50 and a complexity engine 52. The complexity catalog handler 40 is responsible for obtaining from the complexity database catalog 36 the appropriate complexity metrics records created by the application server 38. The resource allocation component 48 is responsible for allocating variable resources to the processing of the separate records in accordance with the complexity metrics thereof. The user interface component 50 is a set of specifically designed and developed front-end programs. The component 50 allows the user of the system to interact dynamically with the system by performing a set of predefined procedures operative in the running of the method. Via the component 50 the user can select an application, such as the one selected for the SAD customer retention purposes, activate the selected application, adjust specific processing parameters, select sets of records for processing according to the complexity metrics thereof, and the like. The component 50 could be developed as a plug-in to any of the known user interfaces. The component 50 will preferably be a Graphical User Interface (GUI), but any other manner of interfacing with the user could be used, such as a command-driven interface, a menu-driven interface or the like.
The database handler 46 receives the input data records from the control database 32 and provides the records to the complexity catalog handler 40. The database handler 46 further receives from the complexity catalog handler 40 the complexity values and scores assigned to the data records and passes them to the control database 32, which in turn supplies the complexity database catalog 36 and the reference transaction database 28 with the complexity values and scores for the data records. The learning component 44 provides a mechanism for matching a given input, such as the complexity vectors for each transaction, to a given output, such as a customer behavior or a deviation from ordinary customer behavior. The learning component 44 provides the scoring component 42 with different scores that are then processed within the scoring component 42. The complexity engine 52 provides complexity values to data records received from the control database 32 within the application server 38 and handled by the database handler 46.
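
The flow of data among the components described above can be pictured with the following simplified, non-limiting sketch. It assumes a catalog that maps record identifiers to complexity values, a complexity engine and learning functions supplied as plain callables, and a scoring step that averages the learning outputs. None of these choices is specified in this description; all names, the placeholder complexity measure and the averaging rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ComplexityCatalog:
    """Stand-in for the complexity database catalog 36 (record id -> value)."""
    values: Dict[str, float] = field(default_factory=dict)

    def store(self, record_id, value):
        self.values[record_id] = value

    def lookup(self, record_id):
        return self.values[record_id]


def run_pipeline(records, complexity_engine, learners, catalog):
    """Assign complexity values, catalog them, and combine learner scores.

    The final score is the mean of the learner outputs; this combination
    rule is an assumption made only for the sake of the example.
    """
    final_scores = {}
    for record_id, record in records.items():
        value = complexity_engine(record)      # role of complexity engine 52
        catalog.store(record_id, value)        # kept in the complexity catalog
        partial = [learn(value) for learn in learners]   # learning component 44
        final_scores[record_id] = sum(partial) / len(partial)  # scoring component 42
    return final_scores


# Hypothetical inputs: a record is a list of field values, and the placeholder
# complexity measure is simply the number of distinct values it contains.
records = {"r1": ["a", "a", "a", "a"], "r2": ["a", "b", "c", "d"]}
engine = lambda rec: float(len(set(rec)))
learners = [lambda c: 1.0 if c > 2 else 0.0, lambda c: c / 4]
catalog = ComplexityCatalog()
scores = run_pipeline(records, engine, learners, catalog)
print(scores)  # prints {'r1': 0.125, 'r2': 1.0}
```

The sketch omits the control database 32, the database handler 46 and the resource allocation component 48, and is intended only to show the order in which complexity values and scores are produced.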

[0019] A person skilled in the art will appreciate that what has been shown is not limited to the description above. Those skilled in the art to which this invention pertains will appreciate many modifications and other embodiments of the invention. It will be apparent that the present invention is not limited to the specific embodiments disclosed and those modifications and other embodiments are intended to be included within the scope of the invention. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation. The invention, therefore, is not to be restricted except in the spirit of the claims that follow. 

What is claimed is:
 1. In a computing environment accommodating at least one input device connectable to at least one server device connectable to at least one output device, a method of processing at least one information unit introduced by the at least one input device by the at least one server device to create at least one information score based on the at least one information unit, the method comprising the steps of: creating at least one complexity catalog based on the at least one information unit; and establishing at least one score unit based on the at least one complexity catalog; and establishing scores for analyzing data.
 2. The method of claim 1 further comprising the steps of: obtaining at least one information unit from the at least one input device by the at least one server device; and displaying at least one scoring unit.
 3. The method of claim 1 wherein the information unit contains information about customer behavior.
 4. In a computing environment accommodating at least one input device connected to at least one server device having at least one output device, a system for the processing at least one information unit introduced via the at least one input device by the at least one server device to process at least one information unit based, the system comprising the elements of: an infrastructure server device to create at least one complexity catalog; and a complexity catalog to hold at least one list of ordered complexity values associated with the partitioned sub-unit blocks; and an application server to build at least one information summary unit based on the at least one information unit and on at least one associated complexity catalog; and a scoring component to provide scores. 